The genomes of 5 underutilized Papilionoideae crops provide insights into root nodulation and disease resistance

Abstract Background The Papilionoideae subfamily contains a large amount of underutilized legume crops, which are important for food security and human sustainability. However, the lack of genomic resources has hindered the breeding and utilization of these crops. Results Here, we present chromosome-level reference genomes for 5 underutilized diploid Papilionoideae crops: sword bean (Canavalia gladiata), scarlet runner bean (Phaseolus coccineus), winged bean (Psophocarpus tetragonolobus), smooth rattlebox (Crotalaria pallida), and butterfly pea (Clitoria ternatea), with assembled genome sizes of 0.62 Gb, 0.59 Gb, 0.71 Gb, 1.22 Gb, and 1.72 Gb, respectively. We found that the long period of higher long terminal repeat retrotransposon activity is the major reason that the genome size of smooth rattlebox and butterfly pea is enlarged. Additionally, there have been no recent whole-genome duplication (WGD) events in these 5 species except for the shared papilionoid-specific WGD event (∼55 million years ago). Then, we identified 5,328 and 10,434 species-specific genes between scarlet runner bean and common bean, respectively, which may be responsible for their phenotypic and functional differences and species-specific functions. Furthermore, we identified the key genes involved in root-nodule symbiosis (RNS) in all 5 species and found that the NIN gene was duplicated in the early Papilionoideae ancestor, followed by the loss of 1 gene copy in smooth rattlebox and butterfly pea lineages. Last, we identified the resistance (R) genes for plant defenses in these 5 species and characterized their evolutionary history. Conclusions In summary, this study provides chromosome-scale reference genomes for 3 grain and vegetable beans (sword bean, scarlet runner bean, winged bean), along with genomes for a green manure crop (smooth rattlebox) and a food dyeing crop (butterfly pea). These genomes are crucial for studying phylogenetic history, unraveling nitrogen-fixing RNS evolution, and advancing plant defense research.

. Fig. 5 and Fig. 6, I did not observe any novel results.The comparisons among these five species cannot tell the unique gene copies, because the other legume species are not included.Reply: Thanks for the reviewer's suggestion.We agree that Fig. 5 and Fig. 6 do not have significant scientific findings.Based on the reference genomes, we identified all the key genes involved in nitrogen-fixing root nodule symbiosis and resistance (R) gene-mediated defense for the 5 species in this study, providing a valuable resource for future functional studies.Then, we performed some simple and shallow evolutionary analyses for these genes.At current stage, we do not believe we can get solid conclusion by just adding more published legume species into the analyses, as we are lacking of accurate phenotype information.Finally, according to the editor's suggestion, we agree to change the article type of this manuscript into "Data Note".
Reviewer #2: The manuscript titled "The Genomes of Five Underutilized Papilionoideae Crops Provide Insights into Root Nodulation and Disease Resistance" describes the assembly of reference genomes for five Papilionoideae species.Although the genomic data generated in this study is relevant to advance our understanding of the genomic mechanisms that control adaptation in the five species used, I have some comments that could help to improve the manuscript before its Introduction: Page 3: Replace subjective descriptions such as "the young bean pods are delicious vegetables" with more objective language.Reply: this sentence was revised to "the young bean pods are used as vegetables".
The introduction does not mention the importance of disease resistance genes.Which references have compared resistance genes in legumes, and how can this help in breeding for species of this family?Reply: In the revised manuscript, the first paragraph of "Evolutionary History of Resistance Genes" was moved to the introduction.
How were the lines that were sequenced selected?Are they representative of a specific marker class or gene pool?Reply: We selected the lines of highly homozygous cultivars that were widely grown in China.
Page 8: In the section "TE sequences which are too difficult," remove "too".Reply: Corrected.Figure 5A does not present results and should be moved to the introduction.Reply: We agree with the reviewer that Figure 5A do not present results.However, we do not want to show any figure in the introduction.We thought that put Figure 5A and 5B-D together will help integrate information.To make it clear, we added a sentence in Figure 5A legend "The data source is from previous studies,".
Figure 5B: Why are there differences across species?P. coccineous has double the number of CHS genes compared to C. pallida; what does this mean?Reply: CHS is key gene for isoflavone biosynthesis.The gene number difference may influence the amount of isoflavone production in different species.However, we do not have accurate phenotype information, so we can't make further conclusion.We added a discussing sentence in the revised manuscript "The difference in CHS gene number may influence the amount of isoflavone production in different species, which needs further investigations in future.".Page 9: Define "NFN".Reply: Definition "NFN (Nitrogen-fixing Nodulation)" was provided in the revised manuscript.
Page 11: The entire first paragraph of "Evolutionary History of Resistance Genes" should be moved to the introduction.Reply: According to the reviewer's suggestion, we moved this paragraph into introduction.

Page 12:
Clarify the coverage of the five reference genomes that were assembled.What constitutes "Low-coverage" for the common bean reference genome?Confirm whether the latest reference genome was used (https://phytozomenext.jgi.doe.gov/info/Pvulgaris_v2_1), which was created using PacBio with a coverage of 83X.If the latest reference genome of the common bean was not used, the analysis needs to be repeated.Reply: Here the coverage means the ratio of assembled sequence in the whole genome.We have updated the analyses with the latest reference genome of the common bean, and largely revised the manuscript: "We performed comparative genomic studies between the two Phaseolus plants scarlet runner bean (P.coccineus) and common bean (P.vulgaris) [7], which diverged from each other ~4.7 MYA (Figure 3A).Overall, all chromosomes from the two species have one-versus-one corresponding relationships, with only some intra-chromosome inversions (Figure 4A).The estimated genome sizes of the two species were both ~590 Mb.Our assembly size of scarlet runner bean is 593 Mb, almost equal to the estimated genome size.Meanwhile, the latest assembly size of common bean (NCBI: GCA_029448765.1) is 615 Mb, which is a little larger than the estimated genome size.Looking into the TEs, we found that 63.5% of the scarlet runner bean genome is composed of TEs, a little lower than that of the common bean genome (65%) (Figure 4B).Furthermore, the 35,523 protein-coding genes of scarlet runner bean were compared to the 42,801 protein-coding genes of common bean, which was annotated by our pipeline in this study.Requiring an alignment E-value of 1e-5, 30,195 (85.0%) scarlet runner bean genes and 32,367 (75.6%) common bean genes were matched, leaving 5,328 (15.0%) unique genes in scarlet runner and 10,434 (24.4%) unique genes in common bean (Figure 4C, Supplementary Table S18), which may be responsible for the phenotypic differences and species-specific functions between the two species." Modify "we found a huge advantage of HiFi-assembly" to "we found an advantage of HiFi-assembly".Reply: In the revised manuscript, this sentence was deleted.
Discuss whether the difference in the number of unique genes between scarlet runner bean and common bean is due to sequencing advantages or actual genomic differences.Reply: When updated the analyses with the latest assembly of common bean, we found that in the old analyses, much of the difference in the number of unique genes should be due to the sequencing advantages, but not actual genomic differences.So we removed our previous conclusion, and largely revised the manuscript.
Elaborate on how "These genomic resources will promote evolutionary and comparative genomics studies in Fabaceae".Reply: This sentence was revised into "The genomic resources in this study expand the lineage coverage in published Fabaceae species, which will promote evolutionary and comparative genomics studies in Fabaceae.".On the identification of key genes in root nodulation for the five studied plants, is this not an expected finding since all are nitrogen-fixing legumes?Is there a difference in efficiency?If so, is there an enrichment of particular genes, more copies, or greater gene expression in the more efficient species?Reply: Yes, all the sequenced 5 plants are nitrogen-fixing legumes.As we do not have accurate phenotype information on nitrogen-fixing efficiency, so we can't make further conclusions.With the annotated genes for each species, we performed gene copies analyses, and found that sword bean, scarlet runner bean and winged bean have one more NIN gene copy than smooth rattlebox and butterfly pea.Moreover, scarlet runner bean and winged bean each have two times more copies of CHS genes than the other 3 plants.These results have been described in the Result part.As this study focus on genome level analyses, so gene expression analyses for functional genes were not performed.
Correct "was detected for all the studies species" to "was detected for all the studied species".Reply: Corrected.Page 13: Remove "obviously" from "We also found that the CHS genes were obviously expanded through".Reply: Corrected.
The information on R genes for the five studied plants belongs in the results section.It would be useful to compare the R genes across the five species to understand similarities and differenfces.How can this information be used to develop more resistant cultivars?Reply: We have compared the gene number for each R gene class among the five species.However, we do not have accurate phenotype information on the disease resistance ability, so we can't make further inference between the gene copy number and disease resistance ability.In the revised manuscript, we added a discussing sentence "Considering that the 5 species in this study were much less domesticated than those well-cultivated legumes such as soybean, the 5 studied species here still have much stronger ability of disease resistance.Therefore, the identified R genes in these 5 species may be transported to major legume crops such as soybean to improve their ability of disease resistance.".Page 16: Correct the references for Papilionoideae species as they are incorrect in the sentence Reply: Corrected.
Reviewer #3: The manuscript "The genomes of five underutilized Papilionoideae crops provide insights into root nodulation and disease resistance", authored by Yuan et al., presents the chromosome-length reference genomes of five crops: sword bean, scarlet runner bean, winged bean, smooth rattlebox, and butterfly pea.The manuscript is largely well-written and covers most of the analyses required for genome assembly and comparative study.However, there are a few areas that could be addressed to improve the manuscript's quality.
-Based on the Hi-C contact matrix for winged bean, it appears that the scaffolding for several chromosomes is not supported by Hi-C data.Additionally, according to Table 1, the genome assembly size for this species is higher than the estimated genome size.It would be advisable to check for contaminants in the assembly or correct the misjoins based on the Hi-C contact matrix.Reply: For winged bean, the chromosome regions lacking Hi-C supports were repeatrich centromeric regions.In fact, there are enough Hi-C data sequenced for these regions, but due to the requirement of unique mapping, most Hi-C data were filtered for these repeat-rich regions.Thus, they can't be shown in the Hi-C heatmap.
In Table 1, the estimated genome size by K-mer method is 689 Mb, while the assembled total contig size is 712 Mb, there is only ~23 Mb (3%) difference.Considering that the variations of the K-mer genome size estimation is about 5%, our genome size estimation generally consists with the genome assembly size .Please refer to the GCE paper: "Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects.https://doi.org/10.48550/arXiv.1308.2012".
According to the reviewer's suggestion, we checked the assembly again.We did not find significant contaminants in our assembly, and we could not find any obvious misjoins based on Hi-C data.
-It appears the genome assembly for the Ma3 cultivar of winged bean was recently published (https://www.nature.com/articles/s41467-024-45048-x).Comparing the with Ma3 could support assembly quality and identify differences between the cultivars.Reply: The Ma3 cultivar assembly (586 Mb) used Oxford nanopore reads with 85%-90% single base accuracy, while our DUOXI-ginseng cultivar (712 Mb) used HiFi reads with 90-99.9%single base accuracy.In theory, HiFi data has great advantage in distinguishing repetitive sequences over Oxford nanopore reads, therefore, our assembly includes a more comprehensive assembly of the centromere sequences.To verify this thought, we downloaded the published Ma3 cultivar genome, and compared with our DUOXI-ginseng cultivar genome.From the macrosynteny figure shown as Supplementary Fig. S4, we found that the two assembly were largely consistent in synteny, except for the middle centromere regions.The Ma3 cultivar assembly missed most of the centromere sequences, in total ~120 Mb, which explains why Ma3 cultivar (586 Mb) has a much smaller assembly size than DUOXI-ginseng cultivar (712 Mb).
However, the Ma3 cultivar assembly also has advantage over our assembly.For Chr06 in our assembly, the position of one chromosome segment seems incorrect due to the lack the Hi-C signal, we have corrected this mistake by referring to the Ma3 cultivar assembly, which has used a genetic map for scaffolding.Therefore, integrating all these genomic resources will generate a better assembly of winged bean.
We added the results of comparing analyses in the revised manuscript: "In addition, we compared our DUOXI-ginseng cultivar assembly of winged bean (712 Mb) to the recently published Ma3 cultivar assembly (Figshare: https://doi.org/10.6084/m9.figshare.19196255) of winged bean (586 Mb), and found that our assembly has large advantage in resolving the repetitive centromere regions by using HiFi reads, while the Ma3 cultivar assembly has advantage in the chromosome-level scaffolding of contigs by using genetic map.The position of a chromosome segment in Chr06 of our assembly for winged bean was corrected according to the Ma3 cultivar assembly (Supplementary Fig. S4)." -The genome used for the common bean seems to be outdated.More complete genomes are available, for example, Carrere et al., 2023 (https://www.sciencedirect.com/science/article/pii/S2352340923003013).The authors should utilize these newer genomes for a better comparison or provide reasoning for using the older reference genomes.Reply: We have updated the analyses using the latest assembly of common bean.The updated results were shown in the revised manuscript: "We performed comparative genomic studies between the two Phaseolus plants scarlet runner bean (P.coccineus) and common bean (P.vulgaris) [7], which diverged from each other ~4.7 MYA (Figure 3A).Overall, all chromosomes from the two species have one-versus-one corresponding relationships, with only some intra-chromosome inversions (Figure 4A).The estimated genome sizes of the two species were both ~590 Mb [39].Our assembly size of scarlet runner bean is 593 Mb, almost equal to the estimated genome size.Meanwhile, the latest assembly size of common bean (NCBI: GCA_029448765.1) is 615 Mb, which is a little larger than the estimated genome size [7].Looking into the TEs, we found that 63.5% of the scarlet runner bean genome is composed of TEs, a little lower than that of the common bean genome (65.0%) (Figure 4B, Supplementary Table S18).
Furthermore, the 35,523 protein-coding genes of scarlet runner bean were compared to the 42,801 protein-coding genes of common bean, which was annotated by our pipeline in this study.Requiring an alignment E-value of 1e-5, 30,195 (85.0%) scarlet runner bean genes and 32,367 (75.6%) common bean genes were matched, leaving 5,328 (15.0%) unique genes in scarlet runner and 10,434 (24.4%) unique genes in common bean (Figure 4C, Supplementary Table S18), which may be responsible for the phenotypic differences and species-specific functions between the two species."-The expansion analysis is insufficient.The authors should conduct an unbiased contraction/expansion analysis using, for example, CAFE, and then identify significantly expanded/contracted gene families in the different species, rather than selectively citing gene families from the literature.This unbiased approach could help identify more gene families of interest that have not been reported before.Reply: We have shown the CAFÉ results in Supplementary Fig. S6, which shows no genome-wide gene burst on the branches of five species in this study, indicating that there is no recent whole genome duplication like soybean.To identify unbiased significantly expanded/contracted gene families in each species, the orthologous group results generated by OrthoFinder should be a good start.However, it is out of our interest to analyze more expanded or contracted gene families.According to the editor's suggestion, we decided to change the article type of this manuscript into "Data Note", which focuses on the primary analyses of the reference genomes.

Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation
-Including the common names of all species in the figures and tables would be beneficial, as these are referred to in the text, making it easier for readers to match the text with the figures and tables.Reply: We thought it is more rigorous to use latin (scientific) names in the figures and tables.We have shown both the common names and latin (scientific) names for the first time appearance in the maintext, and then to be simple, we used the common names in the maintext, as the 5 species in this study are all people familiar trees.
-In the Introduction, the citation numbers for the genomes of the different crops are incorrect.Reply: Corrected.
-On page 5, in the second paragraph, including the exact numbers in the text is unnecessary as they are already in Table 1.Mentioning just the range or average numbers could be more effective.
-On page 7, the full form of PWGD should be provided.Reply: The full form "PWGD, papilionoid-specific WGD" was provided in the revised manuscript.
-Several typographical and grammatical errors need to be corrected to enhance the manuscript's readability.Reply: We have carefully checked and corrected the typographical and grammatical errors.

Editor comments:
The reviewers also mentioned (in their confidential comments to editor) that the story is not that attractive for the presence and minor copy variations of all the related genes, and the manuscript is more suitable for the GigaScience Data Note, rather than the regular research article.Therefore, we suggest you change the article type from "Research" to "Data Note".Reply: We agree to change the article type into "Data Note".respectively.We found that long period of higher LTR-RTs (Long terminal repeat-retrotransposons) activity is the major reason that enlarges the genome size of smooth rattlebox and butterfly pea., Additionally,and there have been is no recent Wwhole Genome Dduplication (WGD) events in these 5 species except for the shared papilionoid-specific WGD event (PWGD, ~55 MyaMYA).
Then, we identified 5,328 and 10,4346,710 and 669 species-specific unique genes between scarlet runner bean and common bean, for each speciesreapectively, which may be responsible for their phenotypic and functional differences and species-specific functions.Furthermore, we identified the key genes involved in root-nodule symbiosis (RNS) in all 5 species, and found that the NIN gene was duplicated in the early Papilionoideae ancestor, followed by the loss of one gene copy in smooth rattlebox and butterfly pea lineages.At last, we identified the resistance (R) genes for plant defenses in these 5 species, and characterized their evolutionary history.

Conclusions:
In summary, this study provides chromosome-scale reference genomes for three grain and vegetable beans (sword bean, scarlet runner bean, winged bean), along with genomes for a green manure crop (smooth rattlebox) and a food dyeing crop (butterfly pea).These genomes are

Introduction
Papilionoideae, the largest subfamily in Fabaceae (Legume) [1] whose name probably originated from its flower's resemblance to a butterfly (Latin: Papilio), has an extremely important position in agriculture and makes great contributions to the human diet and food security.In addition toBesides, the several well-known crops such as soybean [2], peanut [3], faba bean [4], mung bean [5], pea [6], common bean [7]and alfalfa [8], this subfamily also includes many other underutilized crops [9].For example, sword bean (Canavalia gladiata), scarlet runner bean (Phaseolus coccineus) and winged bean (Psophocarpus tetragonolobus) are both grain and vegetable plants: the mature bean seeds are protein-rich grains, while the young bean pods are delicious used as vegetables.Smooth rattlebox (Crotalaria pallida) and butterfly pea (Clitoria ternatea) are often used as green manure and forage grass, due to their high protein content.The dried flowers of butterfly pea are also used as a natural food colorant (blue), which is popular in Southeast Aisa countries [10].In addition, sword bean has also been used as a traditional medicine to improve poor appetite and alleviate vomiting in China for thousands of years [11], and scientists found that smooth rattlebox also has antitumor properties in recent years [12].
The Papilionoideae plants also play a unique ecological role in nitrogen fixation through symbiotic root nodules [13], which is indispensable for the global nitrogen cycle.The ability to nitrogen fixation from the atmosphere also helps agriculture production use fewer synthetic fertilizers, thereby reducing the -energy consumption and mitigating soil pollution [14].As a model of special host-bacteria interaction, the formation of nitrogen-fixing root nodules has been intensively studied on two model species Medicago truncatula and Lotus japonicus [15,16], both belonging to the subfamily Papilionoideae.The host plants excrete flavonoids into the rhizosphere, and induce the rhizobia to express the nodulation (nod) genes [13].Then, the metabolite products of these nod genes (Nod factors) are sensed by the host plants to start nodulation, which requires the coordination of rhizobial infection at the root epidermis with cell division in the cortex [17].
Inside the nodule, rhizobia live in an organelle-like structure known as symbiosome, and the host plants secrete leghemoglobin (Lb) to maintain a low-oxygen environment in order to facilitate the nitrogen-fixing reactions in symbiosomes [18].Recent phylogenomics and phylotranscriptomics studies have shown a single origin of nitrogen-fixing root-nodule symbiosis (RNS), and then multiple independent losses occurred in various lineages [19,20].
Resistance (R) gene-mediated defense plays an important role in plant defenses against all pathogens.It recognizes the pathogen-derived proteins referred to as effectors, and induces a state in the host defined as effector-triggered susceptibility (ETS), which in turn leads to the local hypersensitive response (HR) cell death to restrict pathogen growth and propagation [42].Most cloned R genes encode intracellular NLR receptors, which are typically composed of 3 domains: a central NB (NB-ARC) domain, bordered by a C-terminal leucine-rich repeat domain (LRR), and an N-terminal coiled-coil (CC) or Toll/ interleukin-1 receptor (TIR) or resistance to powdery mildew (RPW8) domain [35].The RPW8 domain is rare in comparison to the two major CC and TIR domains.
For the purpose of nodulation studies and crop breeding, tens of agriculturally important plants in the subfamily Papilionoideae have been sequenced, including all the above well-known species, as well as adzuki bean [21], lablab bean [22], velvet bean [23], kudzu vine [24], pagoda tree [25], and winged bean (Ma3 cultivar) [26] et al..However, the subfamily contains many other rare but valuable species, which still lack reference genomes, hindering the in-depth biological studies and exploitation of these species.Here, we present the chromosome-scale reference genomes for 3 grain and vegetable beans (sword bean, scarlet runner bean, winged bean), a green manure crop (smooth rattlebox), and a food dyeing crop (butterfly pea) to investigate the phylogenetic history, explore the evolution of RNS, and identify the resistance (R) genes involved in controlling crop diseases.

LTR-RTs activity influences the genome size
The highly continuous reference genomes enabled a comprehensive analysis of the transposable elements (TEs).In total, 55%, 63%, 64%, 82% and 86% of the genomes are composed of TEs for sword bean, scarlet runner bean, winged bean, smooth rattlebox, and butterfly pea, respectively (Figure 2A, Supplementary Table S12).Among all the TE types, LTR-RTs especially Gypsy-LRT is the most dominant TE type for all the 5 species.Notably, LTR-RTs activity is also the most contributing factor to the genome size (Figure 2B-2C, Supplementary Table S13-S14), which is consistent with previous reports for most plants [29].The LTR-RTs expansion period in smooth rattlebox and butterfly pea are much wider than the other 3 species, which may partially explain their relatively larger genome sizes.Interestingly, there is a very recent sharp explosion of LTR-RTs in scarlet runner bean, though its LTR-RTs activity is much lower in the long history period.On the contrary, there is a high LTR-RTs expansion in the old history period, but the LTR-RTs activity gets lower and lower in the recent history period in winged bean (Figure 2D).Taken together, these results suggest that the genome size of legume species have been changing in the evolution history along with the LTR-RTs expansions and extractions.
To investigate the whole genome duplication events, we calculated the Ks values of the paralogue pairs for each species.The distribution of Ks values showed a shared peak at around 0.6 for all the 5 species in this study as well as soybean (Figure 3B), consistent with previous reports that an ancient whole genome duplication event (PWGD, papilionoid-specific WGD) occurred at the origin of the papilionoid clade 55 Mya MYA [38].The large amounts of whole genome-wide syntenic fragments inside each species also confirms this inference (Supplementary Fig. S65-6, Supplementary Table S17).Unlike soybean which has a lineage-specific whole genome duplication event (G-LS) 13 MYA ya corresponding to Ks peak around 0.1 [2], all the 5 species in this study do not have any recent Ks peaks, indicating that they do not have recent whole genome duplication events.The gene expansion and contraction analyses along the phylogeny tree also showed no recent genome-wide gene burst on the branches for all the 5 species in this study (Supplementary Fig. S7) Although the chromosome numbers have changed among the 5 species, many large syntenic blocks were are still presentexistent, but -with multiple large-scale chromosome inversion and translocation events (Figure 3C).

Unique genes identified between P. coccineus and P. vulgaris
We performed comparative genomic studies between the two Phaseolus plants scarlet runner bean (P.coccineus) and common bean (P.vulgaris) [7], which diverged from each other ~4.7 Mya MYA (Figure 3A).Overall, all chromosomes from the two species have one-versus-one corresponding relationships, with only some intra-chromosome inversions (Figure 4A).The estimated genome sizes of the two species were both ~590 Mb.Our assembly size of scarlet runner bean is 593 Mb, almost equal to the estimated genome size.Meanwhile, the latest assembly size of common bean (NCBI: GCA_029448765.1) is 615 Mb, which is a little larger than the estimated genome size.Looking into the TEs, we found that 63.5% of the scarlet runner bean genome is composed of TEs, a little lower than that of the common bean genome (65%) (Figure 4B, Supplementary Table S18).Our assembly size of scarlet runner bean with HiFi reads is 593 Mb almost equal to the estimated genome size, however, the assembly size of common bean with Roche/Illumina data is only 532 Mb, missing ~58 Mb (10%) sequences.Looking into the TEs, we found that scarlet runner bean has 52 Mb more LTR-TEs than common bean which largely explains the missing components in the assembly of common bean.Being close to scarlet runner bean, common bean is also very likely to have a recent explosion of LTR-TEs (Figure 2D), resulting in many highly similar copies of TE sequences which are too difficult to assemble using short reads data.In contrast, our assembly of scarlet runner bean with ultralong HiFi data can successfully overcome this problem.
Furthermore, the 35,523 protein-coding genes of scarlet runner bean were compared to the 42,801 protein-coding genes of common bean, which was annotated by our pipeline in this study.thebetter assembly of scarlet runner bean enables an annotation of  4C, Supplementary Table S18), which may be responsible for the phenotypic differences and species-specific functions between the two species.Due to the poor assembly, common bean should in fact have more unique genes, which may be approximate to the number of unique genes in scarlet runner bean.

Expansion of NIN and CHS genes
Legume is special in plants for its nitrogen-fixing root-nodule symbiosis (RNS) [13], which requires a set of key genes (Figure 5A).The chalcone synthase (CHS) and isoflavone synthase protein 2), as well as RPG (Rhizobiumdirected polar growth).At lastThen, the Lb gene was expressed activated to produce leghemoglobin [39].We identified all these key genes in sword bean, scarlet runner bean, winged bean, smooth rattlebox, and butterfly pea (Figure 5B, Supplementary Table S19-S20), providing a valuable gene resource for RNS studies.
In contrast to other symbiosis-relevant genes involved in infection, NIN and RPG are only known to have NFN (Nitrogen-fixing Nodulation) symbiosis-specific functions, whereas the mutation of other genes may have more pleiotropic effects.The phylogenomics studies also found that both the NIN and RPG genes exist in root nodulating legumes, but absenceare absent or have becoame pseudogenes in non-nodulating legumes, indicating that NIN and RPG are the essential genes for root nodulation [19].In this study, we found that the RPG gene was single copy in each plant (Supplementary Fig. S87), but the NIN gene was duplicated in the early Papilionoideae ancestor, possibly as a result of the whole genome duplication event (PWGD) occurred 55 MyaMYA.One gene copy was retained in all the 5 plants, but the other gene copy was missing in smooth rattlebox and butterfly pea (Figure 5C, Supplementary Table S21).Therefore, sword bean, scarlet runner bean and winged bean, each have two copies of the NIN genes, while smooth rattlebox and butterfly pea have only one copy of the NIN gene for each plant.
Isoflavones, a type of polyphenolic secondary metabolite from the phenylalanine pathway in plants with a C6-C3-C6 structure, are predominantly distributed in plants of the Papilionoideae subfamily and are majorly used as the signaling molecules between leguminous plants and rhizobia [40,41].In this study, we identified all the gene copies of CHS and IFS (Figure 5D, scarlet runner bean, winged bean, smooth rattlebox, and butterfly pea, respectively (Figure 6B, Supplementary Fig. S9).In addition, only one R gene with the N-terminal RPW8 domain was identified in smooth rattlebox (Supplementary Fig. S109).Overall, scarlet runner bean has equal number of TNL and CNL genes, sword bean and winged bean have more CNL genes than TNL genes, while smooth rattlebox and butterfly pea have more TNL genes than CNL genes.Through phylogenetic analysis, all the TNL and CNL genes were separated into two major branches (Figure 6C), consistent with previous studies which shown that the three classes of NLR R-genes have evolved at the early origin of ancestral Angiosperm [43].Using Albizia julibrissin as an outgroup, which belongs to the second largest subfamily Caesalpinioideae in Fabaceae (Legume), we inferred 6 TNL orthologous groups (OGs) and 9 CNL OGs within the family Fabaceae, with each OG derived from a single gene in the common Fabaceae ancestor (Figure 6C).The 6 TNL and 9 CNL ancestral genes in Fabaceae, were derived from multiple whole genome polyploidization events or gene duplications since the born of Angiosperm.Considering that the 5 species in this study were much less domesticated than those well-cultivated legumes such as soybean, the 5 studied species here still have a much stronger ability of disease resistance.Therefore, the identified R genes in these 5 species may be transported to major legume crops such as soybean to improve their ability of disease resistance.

Discussion
In this study, we generated chromosome-level genome assemblies and high-quality gene annotations for 5 underutilized legume crops.The assembly quality is near telomere-to-telomere level, with only a few gaps in each constructed chromosome.Smooth rattlebox and butterfly pea have much larger genome sizes than sword bean, scarlet runner bean and winged bean, due to a relatively long period of LTR expansion in the evolutionary history of these two species.
Phylogeny and divergence time for the studied plants were inferred with the single-copy gene families, and a whole genome duplication event at the origin of the papilionoid clade (PWGD) around 55 Mya MYA was detected for all the studieds species.Moreover, we By comparing the HiFi-assembled scarlet runner bean and the low-coverage assembly of common bean derived from Roche/Illuimna data, we found huge advantage of HiFi-assembly in overcoming the difficulties of recently burst repetitive sequences, and identified 6,7105,328 unique genes in scarlet runner bean and only 10,434669 unique gene in common bean, which may be helpful in investigating the functional genes underling their phenotypic differences..These genomic resources in this study expand the lineage coverage in published Fabaceae species, which will promote evolutionary and comparative genomics studies in Fabaceae.
In comparison to the grains, most beans have much higher protein content, which is closely related with nitrogen-fixing root nodule symbiosis.Based on sequence homology, we identified all the key genes involved in root nodulation in the 5 studied plants.The NIN gene was duplicated in the early Papilionoideae ancestor, and then gene loss happened on one branch in smooth rattlebox and butterfly pea.We also found that the CHS genes were obviously expanded through local tandem duplication in scarlet runner bean and winged bean.Our results provide more genomic evidence for the evolution of nitrogen-fixing root nodule symbiosis (RNS), which will promote the molecular breeding of more efficient RNS cultivar and benefit the utilization of global nitrogen-fixing by legume plants.
Improvement of disease resistance in crops has great potential to increase productivity.Huge losses caused by pathogenic fungi, bacteria, nematodes, oomycetes, and viruses could be mitigated by breeding of disease-resistant cultivars.The 5 underutilized legume crops in this study are much less human-selected than other well-known legume crops such as soybean, thus, they may include more powerful resistance (R) genes.In this study, we identified all the R genes in the 5 studied plants, which can be divided mainly into two classes TNL and CNL.Notably, sword bean and winged bean have more CNL genes, but smooth rattlebox and butterfly pea have more TNL genes.
In future, these R genes can be transferred into major legume crops such as soybean to improve its disease resistance, which will reduce the application of chemical pesticide and promote the global food security.

Genome assembly and annotation
The contigs for sword bean, scarlet runner bean, and winged bean were assembled from PacBio HiFi reads utilizing HIFIASM version 0.16.1 [44] with parameter "-l 0", and the contigs for smooth rattlebox, and butterfly pea were assembled from PacBio HiFi reads utilizing HIFIASM version 0.19.5 [44] with parameter "-l 0".Contaminations were removed by aligning all contigs to chloroplast and mitochondria sequences downloaded from the NCBI database, using MINIMAP2 version 2.20 [45] with an identity >0.95 and coverage >0.95.The remaining contigs were used to represent the nuclear contigs, and the completeness was evaluated using BUSCO version 5.1.2[46] with ORTHODB version 10 (embryophyta lineage).Ultimately, the Hi-C reads were aligned to the nuclear contigs and Hi-C contact matrices among the contig bins were generated utilizing HIC-PRO version 3.1.0[47].Utilizing the Hi-C linkage information between contig ends, nuclear contigs (with sizes exceeding 1 Mb) were assembled into scaffolds at the chromosome level utilizing ENDHIC version 1.0 [48].
Tandem repeat elements (TRs) were detected using Tandem Repeats Finder (TRF) version 4.09 [49].Interspersed repeat elements (TEs) were identified through a three-step process: (1) The prediction of structurally intact transposon elements (TEs), including long-terminal-repeat retrotransposons (LTR-RTs), DNA transposon, Helitron, etc., was accomplished using EDTA version 1.9.9 [50].Concurrently, an intact TE library was generated.(http://www.repeatmasker.org).( 3) A de novo TE library was generated from the masked genome with all the above-mentioned identified TEs, using REPEATMODELER version 2.0.1 (http://www.repeatmasker.org/RepeatModeler), then the TE sequences in the library were classified using TERL v1.0 [51].Then, the classified TE sequences were used by REPEATMASKER to identify species-specific TEs in the genome.Ultimately, merging the overlapping coordinates and removing any redundancy was used to produce a non-redundant TE annotation.All TE elements larger than 80 bp in size of the scarlet runner bean and the winged bean, and those larger than 200 bp in size of the sword bean, the smooth rattlebox, and the butterfly pea were soft-masked (uppercase to lowercase) on their genome sequences for gene prediction.
From the orthogroups of ORTHOFINDER, the OG with Arachis hypogaea (recent WGD) [35] having one or two copies and other species having only one copy were selected, then a gene of Arachis hypogaea from duplicate genes was randomly thrown.Subsequently, all single-copy genes of all species were used to employ multiple sequence alignment (MSA) utilizing MUSCLE version v3.8.31 [60], and these MSA were combined to create a concatenate multiple sequence alignment (CMSA).Next, the CMSA was used to build a species tree utilizing RAXML version 1.0.3 [61] with parameters "--model GTR+G --tree pars --bs-trees 100 --outgroup Vitis_vinifera".
To estimate the divergence time, we used the RelTime branch method in MEGA11 [62] with one calibration time 8.0-19.5 million years ago between Phaseolus vulgaris and Vigna angularis and the other calibration time 47.7-56 million years ago between Glycine max and Arachis hypogaea.
The two calibration times were obtained from TimeTree (www.timetree.org).Subsequently, the expansion and contraction of gene families was identified using CAFE version 5 [63] with the parameter "-k5".
MCSCANX [64] was used to identify collinear gene blocks with more than five collinear genes.Synteny figures of the whole genome were plotted using the Java programs dual_synteny_plotter and dot_plotter from the MCSCANX package.According to the result of collinear genes, KaKs_CALCULATOR v2.0 [65] with GMYN model was employed to calculate the synonymous substitution rate (Ks) value for syntenic gene pairs.Chromosome collinearity among species was drawn using JCVI (https://github.com/tanghaibao/jcvi)with parameter --cscore=.99".

Analysis of genes involved in nitrogen-fixing root nodulation
The protein sequences associated with nitrogen-fixing nodulation were obtained from NCBI and Phytozome, and were aligned to the five Papilionoideae species using the 'blastp' algorithm in DIAMOND version 0.8.28 [55], with the parameters "--more-sensitive --evalue 0.00001".The Alignment results were further refined with a 50% identity and 60% coverage threshold.In this way, we identified the potential genes involved in nitrogen-fixing root-nodulation.
To analyze the phylogenetic relationships, the protein sequences within each gene family of NIN, RPG, CHS, and IFS were aligned independently using the MUSCLE version 3.8.31[60].

Figure6
Click here to access/download;Figure ;figure6.pdf the experimental design and statistical methods used should be given in the Methods section, as detailed in our Minimum Standards Reporting Checklist.Information essential to interpreting the data presented should be made available in the figure legends.Yes Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation Have you included all the information requested in your manuscript?Resources A description of all resources used, including antibodies, cell lines, animals and software tools, with enough information to allow them to be uniquely identified, should be included in the Methods section.Authors are strongly encouraged to cite Research Resource Identifiers (RRIDs) for antibodies, model organisms and tools, where possible.Have you included the information requested as detailed in our Minimum Standards Reporting Checklist?Yes Availability of data and materials All datasets and code on which the conclusions of the paper rely must be either included in your submission or deposited in publicly available repositories (where available and ethically appropriate), referencing such data using a unique identifier in the references and in the "Availability of Data and Materials" section of your manuscript.Have you have met the above requirement as detailed in our Minimum Standards Reporting Checklist?Yes Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation Title The genomes of five underutilized Papilionoideae crops provide insights into root nodulation and disease resistance Lihua Yuan 1,2,3,4 , Lihong Lei

Plant materials and sequencing
Commercial seed of sword bean (LVBAO cultivar) was obtained from NONGZHIZI SEEDS company (ZhuZhou, Hunan, China).Commercial seed of scarlet runner bean (Climbing cultivar) was obtained from JINMINXINNONG company (Fuzhou, Fujian, China).Commercial seed of winged bean (DUOXI-ginseng cultivar) was obtained from BOSITE agricultural technology company (Shenyang, Liaoning, China).Commercial seed of smooth rattlebox (Three-ellipse-leaf cultivar) was obtained from the forestry-bureau permitted seed store company (Jiaxing, Zhejiang, China).The young seedling of butterfly pea (Blue-flower cultivar) was obtained from LIUYI flowers and fruit seedlings company.The seeds were grown in plant growth chamber, and young leaves from a single plant for each species were used to extract genomic DNA using the Hi-DNAsecure Plant Kit (TIANGEN DP350, China).The genomic DNA was used to prepare 20-kb inserts sequencing library by SMRTbell Express Template Prep Kit 2.0 (PacBio, USA), and sequenced on Sequel II with the HiFi mode (PacBio, USA).The young leaves from the same plant were also used for Hi-C sequencing on the Illumina NovaSeq 6000 in PE150 mode.The roots, stems, leaves, and flowers of each species were sampled to extract total RNA using the RNeasy Plant Mini Kit (QIAGEN, Germany).The extracted RNA was pooled together for full-length cDNA sequencing on PacBio Sequel II with Iso-Seq mode (PacBio, USA).

( 2 )
Incomplete and homology TEs were detected against the above-mentioned intact TE library, Repbase database version 26.05 (plant lineage), and the Protein-coding TE database using REPEATMASKER version 4.1.2

Figure 2 .
Figure 2. TEs in the 5 sequenced species in this study.(A) Distribution of various types of transposable elements (TEs) in each species.(B) a scatter plot illustrating the correlation between the length of long terminal repeats (LTRs)LTR-RTs and the genome size.(C) Distribution of

Figure 3 .
Figure 3. Evolution of Papilionoideae.(A) Phylogentic tree with divergence time estimated by the RelTime branch method in mega.Two calibration constraints were used: one was 8.0-19.5 million years ago between Phaseolus vulgaris and Vigna angularis, and the other was 47.7-56 million years ago between Glycine max and Arachis hypogaea.The five sequenced species in this study are marked with blue stars.The numbers on the side of the nodes represent divergence time values, and the whole-genome polyploidization events are indicated on the branches in red.(B) Homologous Ks distribution within species, paralogous gene pairs situated on collinear fragments containing over five syntenic gene pairs are employed for Ks calculation using the GMYN model in the KaKS_CALCULATOR.(C) Macro-synteny plots among the 5 studied species.

Figure 4 .
Figure 4. Genomic comparison between Phaseolus coccineus (scarlet runner bean) and Phaseolus vulgaris (common bean).(A) Macro-synteny blocks between P. coccineus and P. vulgaris.Collinear fragments containing over 20 syntenic gene pairs are utilized.Pc and Pv represent P. coccineus and P. vulgaris respectively.(B) Distribution of LTR TEs, DNA TEs, other TEs, tandem repeats (TR), and Non-repeat regions in P. coccineus and P. vulgaris.(C) Overlap of the reference gene sets between P. coccineus and P. vulgaris.The protein sequences are aligned using Diamond with the parameters "--sensitive --evalue 1e-5".Genes that remain unaligned are considered as species-specific genes.

Figure 5 .
Figure 5. Root nodulation symbiosis.(A) Pathway of crucial genes involved in symbiotic nodulation and nitrogen fixation for Papilionoideae.The model figure is drawn using Figdraw.(B)The gene number of symbiotic pathway identified in C.gladiata, P. coccineus, P.tetragonolobus, C.pallida, and C.ternatea.The data source is from previous studies.(C) Gene tree for NIN.Members of the gene family were obtained from orthoFinder orthogroups, and the gene tree was constructed by FastTree.Bootstrap values are shown on each branch, and the two duplicated branches were highlighted.V. vinifera is utilized as the outgroup.(D) Gene tree for CHS, with similar style to NIN.The genes from the five sequenced species were shown in five different colors, and the tandem replicated genes of different species have been highlighted using distinct background colors.

Figure 6 .
Figure 6.Resistance (R) genes.(A) Model figure for three types of R genes TNL, CNL and RNL.TNL consists of TIR, NB-ARC, and LRR domains, CNL consists of CC, NB-ARC, and LRR domains, while RNL consists of RPW8, NB-ARC, and LRR domains from N-terminal to C-
https://doi.org/10.6084/m9.figshare.19196255) of winged bean (586 Mb), and found that our assembly has large advantage in resolving the repetitive centromere regions by using HiFi reads, while the Ma3 cultivar assembly has advantage in the chromosome-level scaffolding of contigs by using genetic map.The position of a chromosome segment in Chr06 of our assembly for winged bean was corrected according to the Ma3 cultivar assembly (Supplementary Fig.S4).In total, 51,158, 35,523, 40,081, 48,759 and 40,267 protein-coding gene models were predicted in the genome of sword bean, scarlet runner bean, winged bean, smooth rattlebox, and butterfly pea, respectively (Figure1, Supplementary TableS7-S9) , with.tThe coding regions covering 2.4-8.2% of the genome for each species 51 Mb (8.2%), 42 Mb (7.1%), 44 Mb (7.4%), 51 Mb (4.2%), and 42 Mb (2.4%) of the genome for each species.The BUSCO complete rates for the gene sets of these 5 species are comparable to those of BUSCO complete rates for the genomes, suggesting a high completeness of our gene annotation (Table 1).For function annotation, 72.11%,-89.0%,83.9%, 87.7% and 86.8% of genes in the 5 species were annotated by at least one of the NCBI-NR, KEGG, InterPro or GO databases (Supplementary Table